Automatic video keyword generation is one of the key ingredients in reducing the burden on security officers in analyzing surveillance videos. Keywords or attributes are generally chosen manually based on expert knowledge of surveillance. Most existing works rely on either supervised learning approaches that require extensive manual labelling or hierarchical probabilistic models that assume the features are extracted using the bag-of-words approach, thus limiting the use of other features. To address this, we turn our attention to automatic attribute discovery approaches. However, it is not clear which automatic discovery approach can discover the most meaningful attributes. Furthermore, little research has been done on how to compare and choose the best automatic attribute discovery methods. In this paper, we propose a novel approach, based on the shared structure exhibited amongst meaningful attributes, that enables us to compare different automatic attribute discovery approaches. We then validate our approach by comparing various attribute discovery methods, such as PiCoDeS, on two attribute datasets. The evaluation shows that our approach is able to select the automatic discovery approach that discovers the most meaningful attributes. We then employ the best discovery approach to generate keywords for videos recorded from a surveillance system. This work shows it is possible to massively reduce the amount of manual work in generating video keywords without limiting ourselves to a particular video feature descriptor.